Give an abstract view (maybe as a diagram) of the knowledge discovery process and characterize it.
What statistical assumptions, summed up in three letters, are often made about the objects in a dataset? Can you illustrate "noise" and "bias"?
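Hint (illustrative only): a minimal Python sketch contrasting i.i.d. "noise" around a true value with a systematic "bias"; all numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 10.0

# Noise: i.i.d. errors centered on the truth; they average out as n grows.
noisy = true_value + rng.normal(loc=0.0, scale=2.0, size=1000)

# Bias: a systematic offset that no amount of averaging removes.
biased = true_value + 1.5 + rng.normal(loc=0.0, scale=2.0, size=1000)

print(f"noisy mean  ~ {noisy.mean():.2f}  (close to {true_value})")
print(f"biased mean ~ {biased.mean():.2f}  (systematically off)")
```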
What is Berkson's paradox? Give an example.
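Hint (illustrative only): a small simulation of the selection effect behind Berkson's paradox; two attributes that are independent by construction become negatively correlated once we condition on a selection criterion (the threshold 1.5 is arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
talent = rng.normal(size=100_000)  # two attributes, independent by construction
looks = rng.normal(size=100_000)

# In the full population the correlation is about 0.
print(np.corrcoef(talent, looks)[0, 1])

# Condition on being "selected" (e.g. cast in a movie): high on the sum of both.
selected = talent + looks > 1.5
print(np.corrcoef(talent[selected], looks[selected])[0, 1])  # clearly negative
```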
Many datasets describe objects with attributes. Categorize the simpler attributes w.r.t. their "types" (according to Stevens) and give the mathematical operations, statistics, and data mining algorithms requiring each "type". You can answer in a table with those three columns and one row per type.
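Hint (illustrative only): a small Python sketch of how the Stevens type constrains which statistics are meaningful; the example attributes are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "color":   ["red", "blue", "red", "green"],        # nominal: only =/!=
    "grade":   pd.Categorical(["B", "A", "C", "A"],
                              categories=["C", "B", "A"],
                              ordered=True),            # ordinal: adds an order
    "temp_C":  [21.0, 19.5, 23.0, 20.0],               # interval: adds differences
    "mass_kg": [1.2, 0.8, 2.4, 1.6],                   # ratio: adds a true zero
})

print(df["color"].mode()[0])                        # nominal: the mode is meaningful
print(df["grade"].sort_values().iloc[len(df) // 2]) # ordinal: a median-like pick
print(df["temp_C"].mean())                          # interval: means make sense
print(np.exp(np.log(df["mass_kg"]).mean()))         # ratio: even the geometric mean
```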
What does it mean for a statistic to be robust? Can you give examples of a robust and a non-robust statistic for the centrality of a distribution? For its dispersion? For the correlation between two interval-scaled attributes?
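Hint (illustrative only): a minimal sketch of robust vs. non-robust statistics under a single gross outlier; Spearman's rank correlation stands in as the robust alternative to Pearson's.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])  # one gross outlier

# Centrality: the mean chases the outlier, the median barely moves.
print(x.mean(), np.median(x))

# Dispersion: the standard deviation explodes, the interquartile range does not.
print(x.std(), stats.iqr(x))

# Correlation: the outlier flips Pearson's sign; rank-based Spearman stays positive.
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 2.0])
print(stats.pearsonr(x, y)[0], stats.spearmanr(x, y)[0])
```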
Algorithms using similarity measures between the objects usually require a pre-processing step. Which one? Can you give a "statistical" way to do it? In this context, why is it problematic to have too many attributes? What about highly correlated attributes?
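Hint (illustrative only): a sketch of z-score standardization, one common "statistical" answer; the two attributes are fabricated to have wildly different scales.

```python
import numpy as np

# Salary in euros vs. age in years: raw Euclidean distances would be
# dominated entirely by the salary column.
X = np.array([[30_000.0, 25.0],
              [45_000.0, 40.0],
              [60_000.0, 32.0]])

# z-score standardization, column by column: subtract the mean, divide by
# the standard deviation; every attribute ends up with mean 0 and variance 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(Z)
```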
Some algorithms use similarity measures between the objects. Can you give two examples of such algorithms for two different tasks? Can you give an example of a similarity measure that does not relate to a distance, and explain why?
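Hint (illustrative only): a minimal sketch of cosine similarity, one candidate answer for a similarity that does not derive from a distance; two distinct collinear vectors already reach similarity 1, so "1 - cosine" violates the identity of indiscernibles required of a metric.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between x and y; 1 means same direction."""
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

a = np.array([1.0, 2.0])
b = np.array([2.0, 4.0])  # a distinct object, yet collinear with a

print(cosine_similarity(a, b))  # 1.0: maximal similarity...
print(np.linalg.norm(a - b))    # ...while a != b, so 1 - cos is not a distance
```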
